library(tidyverse)
library(readxl)
library(statnet)
library(kableExtra)
library(GGally)
library(RColorBrewer)
library(grid)
library(gridExtra)
library(gtable)
library(ggpubr)

The idea for this project came to my mind when I was about to be late for work. In order to get there, I had to change from one tram line to another at the main station. But the other tram did not arrive - and neither did any other going in the same direction. There was a malfunction in the railway control center that prevented my tram and every other tram on that particular track from entering the station.
The railway network (tram/subway) of the city I used to work in has the shape of a star with one central hub station. The main station is right next to that central hub, so every connection going north has to pass through the main station. All the northbound tram/subway lines pass through the main station on two tracks - two to three tram lines per track. If something is wrong with one of the tracks at the main station, it effectively cuts off all lines depending on that track (and all of their stops) from the rest of the network. Because of the structure of the railway network, one cannot simply take another route to one's destination. There is none. Figure 1 illustrates this idea. If the red edge is severed, the nodes 2, 8, 9, and 10 are cut off from the rest of the network.
I could not get to work. Nobody could take a tram up north. None of the three lines on this route could enter the station. The platform was filling with people. As I was standing and waiting on an increasingly crowded platform, a thought crossed my mind: this would not have happened in my home town. Because of the way the railway network there is laid out, the trains could have taken other routes between two points A and B - as shown in Figure 2. This network could have compensated for the loss of the red edge by diverting traffic over the alternative path [1->3->6->5->2]. Of course there would be delays, but traffic could still flow around disturbances within the railway network, while the network from Figure 1 would remain separated until the damage was undone. In other words, the railway network in Figure 2 seems to be more resilient than the one in which I was trying to reach my workplace.
This is when I wondered how other railway networks would hold up. Which networks are more resilient? Which is the most vulnerable? I wanted to compare them. My focus is on the railways, so this project is not about public transportation networks in general. A lot of traffic happens by other means like monorail, cable cars, cog railways, boats and ferries and of course by buses. I will intentionally leave all of them out, so my results will have to be taken with a grain of salt. Suppose I come to the conclusion that the network of a certain city is highly centralized, so that you cannot travel between terminal destinations without going back to a central hub - this conclusion would not account for bus traffic, which might in fact connect those terminal destinations. Even though people might be able to travel with ease between some points in reality, they will not be able to in my model without a connection by rail.
This decision is based on three reasons. First, I am interested in the resilience of networks: the vehicles need to be able to travel through the network. It is not realistic to assume that monorails, cable cars, boats, ferries and other vehicles can switch to the rails that trams/streetcars and subways use. Buses and other motor vehicles, on the other hand, can drive pretty much everywhere (except for trolleybuses), so including them would make the analysis meaningless. The second reason is that rail-bound modes of public transportation play a vital role in pretty much all German cities. Finally, the third and more practical reason is that I want to limit the amount of data I need to gather in order to realize this project. I do not intend to simulate a whole city's traffic. I want to know how unlucky you can get on your way to work in certain German cities, at least if you depend on rail-bound transportation.
There is an additional problem I need to address. There are three major types of rail vehicles in German cities: the tram (streetcar), the U-Bahn (subway/metro/tube) and the S-Bahn. The S-Bahn is a German speciality, as I came to learn. It is a feature of many German city networks. Some S-Bahn lines connect vast areas of urban hinterland or even run between cities. The Rhine-Ruhr region, for example, has a polycentric S-Bahn network connecting over a dozen cities with each other. Including these connections would go beyond the scope of my research interest and increase the possibility of absurd results. As a traveler within a city network, you could find yourself still technically connected to your destination via a detour over several cities and a large distance, even though a truly critical connection in your city has been severed.
So, how to deal with the S-Bahn then? Technically, the S-Bahn belongs to the national railway company (the Deutsche Bahn). This certainly is an argument for not counting it in, but it needs to be included. For many cities, e.g. Berlin or Hamburg, the local traffic networks would simply cease to function (and in large part: to exist) if one were to remove the S-Bahn. In order to deal with this, I made the rule to only include connections whose terminal stations lie within the network plan. If S-Bahn lines go beyond the edge of the network plan and have only a few stops, I consider them interregional and do not include them in the data set. This decision comes at a cost, but a comparatively small one. Reality is messy.
To add to this problem, I encountered some S-Bahn lines that went well beyond the network plan but had a large number of stops within it. These I could not simply exclude. I handled these cases by including them and documenting their scope and the city they connect to via two additional variables. This sounds like an arbitrary decision, but it becomes clear when viewing the network plans: the interregional S-Bahn lines are usually drawn as thin grey lines in the background.
A similar problem arose with regional trains (RB - "Regionalbahn"). My data sources are, for the most part, network plans. In many cases, regional trains are included in those, but they are not really part of the local network. Regional train lines usually just fade off at the edges of the network plan, so my rule seemed able to deal with them effortlessly. Or so I thought - as it turned out during the encoding of the data, there are some examples of regional trains lying completely within local traffic networks. Luckily, there are just a few. I included them in my data set, but will exclude them from the analysis, since they cause a myriad of problems.
Working on this project often proved challenging, but also rewarding and surprisingly insightful. I learned a great deal about cities I have never seen, just by gathering my data. My favourite of these secondary insights is that Berlin has a tram network - but only in one half of the city. This is of course rooted in the German division during the latter half of the 20th century. There is a well-known satellite photo by astronaut Chris Hadfield, in which the former division of the city is still visible at night: East Berlin used different light bulbs than West Berlin. It turns out that I can now make my own version of this, because while the tram was abolished in West Berlin during the 1970s, it remained in East Berlin and is still in service today.
Left side: NASA / Chris Hadfield. Right side: my version with connections serviced by trams in red.
The local railway networks are conceptualized as graphs, consisting of nodes (the stations) and edges (the rails between them). I am interested in analyzing how well the different city networks deal with punctual stress, which I define as the loss of a single edge within the network. I decided against looking at the loss of a complete node, since this would constitute significantly heavier damage to the network. By focusing on an edge, I leave open the possibility of an alternative route between its two adjacent nodes. Since I am interested in the fate of a hypothetical traveler within the network, this seems the more likely scenario.
In order to address this question, I can make some simplifications:
I will assume an undirected graph. There are very few unidirectional edges in the networks I have encountered so far. Since they would complicate things while adding little insight, I have chosen to ignore them. They are, however, covered by a variable in my data set for potential later analysis.
For this first version, I made the decision to treat the types of transportation (Tram, U-Bahn and S-Bahn) equally. It is not realistic to assume that the S-Bahn could simply swap to the tracks of the Tram, but people can transfer between them by walking from one to the other.
The graphs will be tested by deleting one edge at a time and measuring a) whether parts of the network are separated, and if they are, b) how big these cut-off parts are. In terms of network analysis, I am interested in the number of components and their sizes. The bigger the cut-off parts are, the higher the damage suffered by the network. A coefficient measuring the damage will be calculated by dividing the size of the cut-off part by half the network size. Since all networks will experience at least some separations (especially near the terminal stations, where stops are usually arranged like pearls on a string), I will calculate a second coefficient by multiplying the first one with the weight of the edge. This way, I compensate for the somewhat artificial cut-offs at the fringes of the network, since long arms stretching into the hinterlands of a city are usually not critical for the network as a whole. The weight of an edge is calculated as the mean of the betweenness centralities of its two adjacent nodes.
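To make the damage measurement concrete, here is a minimal sketch using a ten-node stand-in for the star network from Figure 1 (the edge list is illustrative, not the actual figure data). It severs the critical edge and computes the first coefficient:

```r
library(network)
library(sna)

# Illustrative star network: node 1 is the central hub and edge [1,2]
# plays the role of the red edge from Figure 1.
el <- matrix(c(1,2, 2,8, 8,9, 8,10,
               1,3, 3,6, 1,4, 4,5,
               1,7, 6,7), ncol = 2, byrow = TRUE)
net <- network(el, directed = FALSE, matrix.type = "edgelist")

net2 <- network::delete.edges(net, 1)          # sever edge [1,2]
c2 <- sna::components(net2)                    # component count rises to 2
cut.off <- min(component.dist(net2)$csize)     # cut-off part: {2, 8, 9, 10}
CO <- cut.off / (network.size(net2) * 0.5)     # 4 / 5 = 0.8
```

Severing the hub-side edge strands the whole northern arm, so CO comes out close to its theoretical maximum of 1.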
The data are collected by encoding the network plans/maps of local traffic networks. Where possible, I webscrape this data, but in most cases it is encoded manually. The data take the form of an edge list with some additional information. Each connection between two stations is encoded for each train line serving it. Although I will not use this information for the current project, this is done to achieve a greater depth of information while I am at it and to ensure some kind of failsafe against errors (each line has to be complete in the end). My source data look like this:
| ID.Source | Name.Source | ID.Target | Name.Target | Linie | Color | City | Type | Service | Direktionalität | Scope | Scope_connects_to | last_checked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H0001 | Wettbergen | H0002 | Tresckowstr. | 3 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
| H0002 | Tresckowstr. | H0003 | Mühlenberger Markt | 3 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
| H0003 | Mühlenberger Markt | H0004 | Am Sauerwinkel | 3 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
| H0004 | Am Sauerwinkel | H0005 | Bartold-Knaust-Str. | 3 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
| H0005 | Bartold-Knaust-Str. | H0006 | Wallensteinstr. | 3 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
| H0001 | Wettbergen | H0002 | Tresckowstr. | 7 | blue | Hannover | Tram | 1 | bidirektional | internal | NA | pre 2020-11-30 |
My sample consists of German cities with railway networks that meet the minimum size of 100,000 inhabitants, which is the requirement to be counted as a large city (German: "Großstadt"). According to this list, this encompasses 81 cities. Since data collection takes time, I decided to write this project with a smaller sample and to add the other cities as I encode them. As of this version, I have data on 12 cities, including the biggest ones like Berlin, Munich, Cologne and Hamburg. All code is written so that it can deal with additional data effortlessly later on. Since I encoded the data myself, I do not have to worry about missing values, which is nice.
Even though I wrote the data myself, I cannot use them without performing some transformations first. The following functions will perform them. In my R project they are saved in a separate R script file and run from a sort of main function. In the following code chunks, I will introduce them one by one.
The processing involves the following steps:
# Determine distinct cities in the data set and save them in a vector.
nvn.makeSplitString <- function(df, splitCol = "City"){
splitString <- unique(df[[splitCol]])
return(splitString)
}
# Split data set by splitString and return a list.
# Split data set by splitString and return a list.
nvn.splitdf <- function(df, splitString, splitCol = "City"){
  list.df <- list()
  for(i in seq_along(splitString)){
    list.df[[i]] <- df[df[[splitCol]] == splitString[i], ]
  }
  return(list.df)
}

Next I need to extract the information on the nodes and the edges. The pattern for this is always the same. First, I write a function that takes a data frame and extracts the information. Then I write a second function, which applies the first one to the list objects in which I store my data frames. Let's get the nodes first.
# Take a data frame, get a list of all unique nodes and return them as a data frame.
nvn.getNodes <- function(df, ID.Source = "ID.Source", ID.Target = "ID.Target",
                         Name.Source = "Name.Source", Name.Target = "Name.Target"){
nodes.sourceID <- c(df[[ID.Source]], df[[ID.Target]])
nodes.sourceNames <- c(df[[Name.Source]], df[[Name.Target]])
nodes.tempDf <- tibble(nodes.sourceID, nodes.sourceNames)
df.nvnNodes <- distinct(nodes.tempDf, .keep_all = T)
names(df.nvnNodes)[1] <- "ID.Node"
names(df.nvnNodes)[2] <- "Name.Node"
return(df.nvnNodes)
}
# Applies nvn.getNodes to a list of data frames and returns a list of data frames.
nvn.cityNodes <- function(cityList = "cityList", splitString, ID.Source = "ID.Source",
ID.Target = "ID.Target", Name.Source = "Name.Source",
Name.Target = "Name.Target"){
list.nodes <- list()
for(i in seq_along(splitString)){
df <- cityList[[i]]
temp <- nvn.getNodes(df, ID.Source = ID.Source, ID.Target = ID.Target,
Name.Source = Name.Source, Name.Target = Name.Target)
list.nodes[[i]] <- temp
}
return(list.nodes)
}

The resulting data frames are a simple list of all nodes with their IDs and names. They look like this:
| ID.Node | Name.Node |
|---|---|
| H0001 | Wettbergen |
| H0002 | Tresckowstr. |
| H0003 | Mühlenberger Markt |
| H0004 | Am Sauerwinkel |
| H0005 | Bartold-Knaust-Str. |
| H0006 | Wallensteinstr. |
Next, the edges.
# Take a data frame, get all unique combinations of Source and Target, return them as a data frame
# and keep all other information.
nvn.getEdges <- function(df, ID.Source = "ID.Source", ID.Target = "ID.Target"){
edges.spread <- df %>%
select(ID.Source, ID.Target, everything())
df.nvnEdges <- edges.spread %>%
group_by(ID.Source, ID.Target) %>%
summarise(Weight = n())
return(df.nvnEdges)
}
# Applies nvn.getEdges to a list of data frames and returns a list of data frames.
nvn.cityEdges <- function(cityList = "cityList", splitString, ID.Source = "ID.Source",
ID.Target = "ID.Target"){
list.edges <- list()
for(i in seq_along(splitString)){
df <- cityList[[i]]
temp <- nvn.getEdges(df, ID.Source = ID.Source, ID.Target = ID.Target)
temp$City <- rep(splitString[i], nrow(temp))
list.edges[[i]] <- temp
}
return(list.edges)
}

The resulting data frames are edge lists with added weights. The weights correspond to the number of train lines serving each edge. They look like this:
| ID.Source | ID.Target | Weight | City |
|---|---|---|---|
| H0001 | H0002 | 2 | Hannover |
| H0002 | H0003 | 2 | Hannover |
| H0003 | H0004 | 2 | Hannover |
| H0004 | H0005 | 2 | Hannover |
| H0005 | H0006 | 2 | Hannover |
| H0006 | H0007 | 3 | Hannover |
Next I have to do some wrangling with list.edges, since it is not yet in the desired state. There are two problems with it. The first one is pretty straightforward: since I encoded the data for each train line, there are many redundant edges in the data set, which need to be filtered out. The second problem is related to the somewhat chaotic way the data was encoded. While dealing with the network plans, I usually followed the train lines through the network. This bears the possibility that I encoded the same edge between two nodes A and B twice: once as [A,B] and once as [B,A]. I call these edges inverts. Inverts cannot be detected by looking for distinct edges, since they are distinct.
Both operations are necessary to ensure that the data set for analysis contains only one edge per pair of nodes. If it does not, my results would be invalid, since the deletion of an edge (where an invert exists) would not result in a separation of the network.
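A vectorized alternative to handle both problems at once (a sketch, not the approach I ended up using) is to canonicalize each pair by sorting the two IDs, so an invert becomes identical to its original and a single distinct() pass removes both duplicates and inverts. The toy edge list here is illustrative:

```r
library(dplyr)

# Toy edge list: row 2 is an invert of row 1, row 4 a plain duplicate.
df.edges <- data.frame(ID.Source = c("H0001", "H0002", "H0002", "H0001"),
                       ID.Target = c("H0002", "H0001", "H0003", "H0002"))

# Sort each ID pair with pmin/pmax so [A,B] and [B,A] collapse into the
# same canonical pair, then keep one row per pair.
df.canonical <- df.edges %>%
  mutate(lo = pmin(ID.Source, ID.Target),
         hi = pmax(ID.Source, ID.Target)) %>%
  distinct(lo, hi, .keep_all = TRUE) %>%
  select(-lo, -hi)
```

This avoids the quadratic pairwise comparison, at the cost of not preserving the original source/target orientation of the surviving row.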
# Take a list of data frames, drop all non-distinct edges for each data frame and return a list of data frames.
nvn.distinctEdges <- function(list, ID.Source = "ID.Source", ID.Target = "ID.Target"){
list2 <- list
for(i in seq_along(list2)){
list2[[i]] <- list[[i]] %>% distinct(ID.Source, ID.Target, .keep_all = TRUE)
}
return(list2)
}
# Take a data frame, detect inverts, remove them and return a data frame.
nvn.rmInverts <- function(df, col1, col2){
df2 <- df
df2$discard <- rep(FALSE, nrow(df2))
for(i in 1:nrow(df2)){
val1 <- df2[[i, col1]]
val2 <- df2[[i, col2]]
for(j in 1:nrow(df2)){
if((df2[[i, "discard"]] == FALSE) &
(df2[[j,col2]] == val1) & (df2[[j, col1]] == val2)){
df2[j, "discard"] <- TRUE
}
}
}
df2 <- df2[df2$discard == FALSE,]
df2 <- df2[,colnames(df2) != "discard"]
return(df2)
}
# Applies nvn.rmInverts to a list of data frames and returns a list of data frames.
nvn.rmInvertsOnList <- function(list, col1 = "ID.Source", col2 = "ID.Target"){
list2 <- list
for(i in seq_along(list2)){
list2[[i]] <- nvn.rmInverts(list[[i]], col1 = col1, col2 = col2)
}
return(list2)
}

The next function calculates a weight called bCO for each edge in a network. The coefficient is calculated as the mean betweenness centrality of the edge's two nodes. This weight is intended to reflect the importance of a particular edge for the whole network: its value increases with the number of possible paths going through the edge. The second function simply applies the first to the list objects used in this project.
# Take a data frame containing an edge list and calculate the betweenness-centrality of an edge as a mean.
nvn.betweennessEdges <- function(net, df.edges, node1 = "ID.Source", node2 = "ID.Target", rescale = FALSE){
df.centrality <- data.frame(Names = network.vertex.names(net),
val = betweenness(net, rescale = rescale))
vec.out <- double()
for(i in 1:nrow(df.edges)){
src <- as.character(df.edges[i, node1])
tar <- as.character(df.edges[i, node2])
vec.out[i] <- (df.centrality[df.centrality$Names == src, 2] +
df.centrality[df.centrality$Names == tar, 2])/2
}
return(vec.out)
}
# Applies nvn.betweennessEdges to a list of data frames and returns a list of data frames.
nvn.betweennessEdgesList <- function(net, list, node1 = "ID.Source", node2 = "ID.Target", rescale = FALSE){
list2 <- list
for(i in seq_along(list2)){
list2[[i]]$bCO <- nvn.betweennessEdges(net[[i]], list[[i]], node1 = node1,
node2 = node2, rescale = rescale)
}
return(list2)
}

Lastly, I need to import my data and run the functions just written. With the data in the right form, I can create the network objects needed for the analysis. This is done by mapping the statnet::network function over the list.edges.undirected object.
# Data import and data wrangling.
input_nvn <- "data/Nahverkehrsnetzwerke - V5.csv"
df.nvnSource <- read_csv2(input_nvn)
df.nvnSource <- df.nvnSource[df.nvnSource$Type %in% c("Tram", "S-Bahn", "U-Bahn") & df.nvnSource$Service == 1,]
splitString <- nvn.makeSplitString(df.nvnSource)
cityList <- nvn.splitdf(df.nvnSource, splitString)
list.nodes <- nvn.cityNodes(cityList, splitString)
list.edges <- nvn.cityEdges(cityList, splitString)
list.edges.undirected <- nvn.distinctEdges(list.edges)
list.edges.undirected <- nvn.rmInvertsOnList(list.edges.undirected)
# Creation of network-objects.
list.nets <- map(list.edges.undirected, network, directed = FALSE, matrix.type = "edgelist")
# Addition of betweeness-centrality based coefficient for edges.
list.edges.undirected <- nvn.betweennessEdgesList(list.nets, list.edges.undirected, rescale = TRUE)

Bridges are edges that increase the number of components in a network when they are removed. Identifying the edges that are bridges within a local railway network means finding the network's weak points. Removing these edges damages the network by cutting off parts of it. By measuring the size of the cut-off components, the damage suffered by the network can be quantified. I do this with two coefficients. The first coefficient compares the size of the cut-off component with half the total network size. I will call this coefficient CO. Since I consider the smaller component as the part that is cut off from the rest of the network, the maximum size of the cut-off part equals half the network size. A value of 1 can be interpreted as the theoretical maximum damage inflicted on a network by the severance of a single edge, while 0 indicates that nothing was cut off. The advantage of the coefficient CO is that it can be compared across different networks, but it is sensitive to cut-offs in the fringe areas of the network.
A second coefficient named wCO is introduced to deal with cut-offs at the fringes of the network by simply multiplying the first coefficient with an edge weight. Initially, I used the number of lines on an edge as a simplistic weight, but I later replaced it with the slightly more sophisticated weight bCO, which is derived from the well-established network-analysis measure of betweenness centrality. This emphasizes the importance of central edges and decreases the impact of the fringe areas, but it comes at the cost of not having a fixed maximum value for this coefficient. The coefficient wCO cannot be compared across different networks, but it is useful for assessing the damage to a particular network more precisely. Later on, when visualizing the networks, I use the mean of wCO + k standard deviations for the color palette to indicate the most vulnerable edges of the networks.
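As a toy illustration of how bCO and wCO combine (a sketch on a four-node path, not project data): on the path 1-2-3-4, the middle edge [2,3] would cut the network exactly in half, and its two endpoints carry all of the betweenness, so both CO and bCO single it out.

```r
library(network)
library(sna)

# Path network 1-2-3-4; removing the middle edge [2,3] splits it in half.
el <- matrix(c(1,2, 2,3, 3,4), ncol = 2, byrow = TRUE)
net <- network(el, directed = FALSE, matrix.type = "edgelist")

bc <- betweenness(net, rescale = TRUE)   # rescaled node betweenness (sums to 1)
bCO <- (bc[2] + bc[3]) / 2               # mean over the endpoints of edge [2,3]
CO <- 2 / (network.size(net) * 0.5)      # cut-off of size 2 vs. half-size 2 -> CO = 1
wCO <- CO * bCO
```

Only nodes 2 and 3 lie on any shortest path, so the edge between them gets the full weight, while the outer edges [1,2] and [3,4] would be discounted despite also being bridges.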
# Take a network object, delete an edge and check if its number of components increases. If it does, measure the size of the cut-off.
nvn.bridgeImpact <- function(net){
c.cnt <- components(net)
c1.size <- rep(network.size(net), network.edgecount(net))
c.rise <- rep(0, network.edgecount(net))
c2.size <- rep(0, network.edgecount(net))
for(i in 1:network.edgecount(net)){
net2 <- net
net2 <- network::delete.edges(net2, i)
c2.cnt <- sna::components(net2)
if(c.cnt < c2.cnt){
c.rise[i] <- 1
comp <- component.dist(net2)
c2.size[i] <- min(comp$csize)
}
}
cut.offRel <- c2.size/(c1.size*0.5)
df.result <- data.frame(c.rise, c1.size, c2.size, cut.offRel)
return(df.result)
}
# Applies nvn.bridgeImpact to a list of data frames and returns a list of data frames.
nvn.getBridgeImpact <- function(list.edg, list.net){
list2 <- list.edg
for(i in seq_along(list2)){
list2[[i]] <- cbind(as.data.frame(list2[[i]]), nvn.bridgeImpact(list.net[[i]]))
}
return(list2)
}
# Adds a calculated column to the data frames of list.results.
nvn.weightedCutOffs <- function(list, col1 = "Weight", col2 = "cut.offRel"){
list2 <- list
for(i in seq_along(list2)){
list2[[i]]$weightedCutOff <- unlist(list2[[i]][,col1] * list2[[i]][,col2])
}
return(list2)
}

Now all the pieces are in place to run the main calculation. First, a data frame containing the data on the cut-offs for each network is created and saved into a list. In the next step, the cut-offs are weighted. The resulting data frames for each network look like the example in the table below.
# Run nvn.bridgeImpact for all network objects and save the results
list.results <- nvn.getBridgeImpact(list.edges.undirected, list.nets)
# Calculate weighted coefficient by running nvn.weightedCutOffs for the results.
list.results <- nvn.weightedCutOffs(list.results, col1 = "bCO")

At this point, the main calculations are done and the question of the vulnerability of local railway networks could be answered. But that would be boring and ugly, at least at this point. In order to compare the city networks, I want to do two things: firstly, the results will be shown in a simple table, since this is the most concise and comprehensive way to compare things. Secondly, I want to show each network as a graph together with information about the network and its top n most critical edges. When I do this, I also want to provide an easy way of spotting the critical edges by coloring them. Additionally, I want to display the proper names of the stations, because they will not be recognisable by their IDs alone. In short, some more coding is needed. Let's get to it.
All my data are currently saved in data frames within list-objects. I need to unnest them in order to compare them within one table. I wrote a helper-function to gather all the individual data frames into one table which is then aggregated on the city-level. The result will be a comprehensive table which is named df.summary. It displays the key measures for the city railway networks. In a second step, I also round the values within df.summary and list.results for increased readability.
# Helper function for gathering data frames from list.results.
nvn.gatherResults <- function(list){
for(i in seq_along(list)){
if(i == 1){
df <- list[[i]]
} else {
df <- rbind(df, list[[i]])
}
}
return(df)
}
# Gathering the data frames from list.results.
df.results <- nvn.gatherResults(list.results)
# Aggregation of measures per city.
df.summary <- df.results %>%
group_by(City) %>%
summarise(network.size = max(c1.size),
c2.size_mean = mean(c2.size),
c2.size_max = max(c2.size),
CO_mean = mean(cut.offRel),
CO_sd = sd(cut.offRel),
CO_max = max(cut.offRel),
wCO_mean = mean(weightedCutOff),
wCO_sd = sd(weightedCutOff),
wCO_max = max(weightedCutOff)) %>%
arrange(desc(wCO_mean)) %>%
add_column(. , Rank = 1:nrow(.), .before = "City")
# Function taken from https://stackoverflow.com/questions/29875914/rounding-values-in-a-dataframe-in-r
nvn.round_df <- function(x, digits) {
numeric_columns <- sapply(x, mode) == 'numeric'
x[numeric_columns] <- round(x[numeric_columns], digits)
x
}
df.summary <- nvn.round_df(df.summary, 4)
list.results <- lapply(list.results, nvn.round_df, 4)

With this code, the table for the results is ready. It will be shown in the next chapter along with the other results.
The preparations for the display of the network graph with additional information are a bit more extensive. Since this will be done for each individual network, the following functions are designed to work with my list-objects. They will loop through them and gather information on the top n edges and on the real-world names of the stations. I also need some code for the visualization of the graph and its colorization. In the end, I put all the functions in a loop for procedural display of the results for all networks contained in my list-objects. This might not be the most elegant approach, but it gets the job done and makes the addition of new networks effortless.
The following function defines the colors for the most vulnerable edges of a network. I decided to base the coloring on the distribution of the edge weight derived from the betweenness centrality of an edge's two nodes. The coloring scheme encompasses seven color steps ranging from yellow to dark red. Each step has the width of one standard deviation. Since the coloring shall show only vulnerabilities, every edge below the mean of the distribution will simply be grey. This way it should be obvious to the eye where the problematic edges lie in the network. The drawback of this solution is that even the most robust networks will have some edges in warning colors.
nvn.makeColGradient <- function(vec){
  m <- mean(vec)
  s <- sd(vec)
  vec2 <- rep("gray73", length(vec))
  # Color steps of one standard deviation each, starting at the mean;
  # everything below the mean stays grey.
  for (i in seq_along(vec)){
    if (vec[i] >= m & vec[i] < m + s){
      vec2[i] <- "yellow2"
    } else if (vec[i] >= m + s & vec[i] < m + 2*s){
      vec2[i] <- "orange1"
    } else if (vec[i] >= m + 2*s & vec[i] < m + 3*s){
      vec2[i] <- "orange3"
    } else if (vec[i] >= m + 3*s & vec[i] < m + 4*s){
      vec2[i] <- "orangered1"
    } else if (vec[i] >= m + 4*s & vec[i] < m + 5*s){
      vec2[i] <- "red1"
    } else if (vec[i] >= m + 5*s & vec[i] < m + 6*s){
      vec2[i] <- "red3"
    } else if (vec[i] >= m + 6*s){
      vec2[i] <- "red4"
    }
  }
  return(vec2)
}

Next, I wrote two simple functions that look up the top n edges with regard to the two coefficients CO and wCO.
nvn.topNcutOffs1 <- function(list, listindex, n, col1 = "ID.Source", col2 = "ID.Target", col3 = "cut.offRel"){
df <- list[[listindex]]
df <- df[order(df[,col3], decreasing = TRUE),]
df2 <- df[1:n, c(col1, col2, col3)]
return(df2)
}
nvn.topNcutOffs2 <- function(list, listindex, n, col1 = "ID.Source", col2 = "ID.Target", col3 = "weightedCutOff"){
df <- list[[listindex]]
df <- df[order(df[,col3], decreasing = TRUE),]
df2 <- df[1:n, c(col1, col2, col3)]
return(df2)
}

The nodes in my network objects are identified by an ID, which is incomprehensible to the reader (or me). I intend to show some of the connections in the network explicitly. In order to provide a real-world reference, a function for looking up the names of the nodes is needed. I included three modes for displaying the information: "add" - get station names as an extra column; "replace" - get station names instead of the IDs; "concat" - display ID and station name together.
## nvn.lookUpNames
nvn.lookUpNames <- function(df, listindex, mode = "add", trunc_by = 20){
df2 <- list.nodes[[listindex]]
if(mode == "add"){
df <- left_join(df, df2, by = c("ID.Source" = "ID.Node"))
colnames(df)[ncol(df)] <- "Name.Source"
df <- left_join(df, df2, by = c("ID.Target" = "ID.Node"))
colnames(df)[ncol(df)] <- "Name.Target"
df <- df[,c(1,4,2,5,3)]
df[,c(1,4,2,5,3)] <- map(df[,c(1,4,2,5,3)], str_trunc, trunc_by, side = "center", ellipsis = "...")
} else if (mode == "replace"){
df <- left_join(df, df2, by = c("ID.Source" = "ID.Node"))
colnames(df)[ncol(df)] <- "Name.Source"
df <- left_join(df, df2, by = c("ID.Target" = "ID.Node"))
colnames(df)[ncol(df)] <- "Name.Target"
df <- df[,3:ncol(df)]
df <- df[,c(2,3,1)]
df[,c(2,3,1)] <- map(df[,c(2,3,1)], str_trunc, trunc_by, side = "center", ellipsis = "...")
} else if (mode == "concat"){
df <- left_join(df, df2, by = c("ID.Source" = "ID.Node"))
df$Source <- str_c(df[["ID.Source"]], df[["Name.Node"]], sep = " ")
df <- df[,-(ncol(df)-1)]
df <- left_join(df, df2, by = c("ID.Target" = "ID.Node"))
df$Target <- str_c(df[["ID.Target"]], df[["Name.Node"]], sep = " ")
df <- df[,c(3,4,6)]
df <- df[,c(2,3,1)]
df[,c(2,3,1)] <- map(df[,c(2,3,1)], str_trunc, trunc_by, side = "center", ellipsis = "...")
}
return(df)
}

As a last preparation, I wrote two functions for sorting my list-objects. Since the sorting shall be a ranking that reflects the vulnerability of the networks, it can only be implemented after the results are known. At this point, the table df.summary is already sorted by the coefficient wCO, while the list-objects are still in their original order. The two functions below create a sort index based on df.summary and apply it to all relevant objects. As a result, the networks can now be presented from worst to best (in terms of vulnerability).
## nvn.sortIndex
nvn.sortIndex <- function(list, df){
  oldList <- integer()
  newList <- integer()
  for(i in seq_along(list)){
    oldList <- c(oldList, i)
    temp <- unique(list[[i]]$City)
    index <- df[df$City == temp, 1]
    newList <- c(newList, as.integer(index))
  }
  df2 <- data.frame(newList, oldList)
  return(df2)
}
## nvn.sortList
nvn.sortList <- function(list, df.sort){
  list2 <- list
  for(i in seq_along(list)){
    list2[[df.sort[i,1]]] <- list[[i]]
  }
  return(list2)
}
sort_by <- nvn.sortIndex(list.results, df.summary)
list.results <- nvn.sortList(list.results, sort_by)
list.nets <- nvn.sortList(list.nets, sort_by)
list.nodes <- nvn.sortList(list.nodes, sort_by)
list.edges <- nvn.sortList(list.edges, sort_by)
list.edges.undirected <- nvn.sortList(list.edges.undirected, sort_by)

Now, everything is in place for the presentation of the results for each network in my data set.
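To make the mechanics of the two helpers concrete, here is a minimal, self-contained sketch with made-up data (the city names and ranks are purely illustrative; the real code operates on list.results and df.summary):

```r
# Toy illustration of the sort-index mechanics (hypothetical data).
# Each list element carries a City column; column 1 of df holds the target rank.
nvn.sortIndex <- function(list, df){
  oldList <- integer()
  newList <- integer()
  for(i in seq_along(list)){
    oldList <- c(oldList, i)
    temp <- unique(list[[i]]$City)
    newList <- c(newList, as.integer(df[df$City == temp, 1]))
  }
  data.frame(newList, oldList)
}
nvn.sortList <- function(list, df.sort){
  list2 <- list
  for(i in seq_along(list)){
    list2[[df.sort[i, 1]]] <- list[[i]]  # move element i to its rank position
  }
  list2
}

toy.results <- list(data.frame(City = "Berlin"),
                    data.frame(City = "Potsdam"),
                    data.frame(City = "Jena"))
# df.summary analogue, already sorted by vulnerability, worst first.
toy.summary <- data.frame(Rank = 1:3, City = c("Potsdam", "Jena", "Berlin"))

sort_by <- nvn.sortIndex(toy.results, toy.summary)
toy.sorted <- nvn.sortList(toy.results, sort_by)
sapply(toy.sorted, function(x) unique(x$City))  # "Potsdam" "Jena" "Berlin"
```

The sort index is simply a permutation: element i of the original list is moved to the position given by its rank in the summary table.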
The code below generates several graphical objects, which are aligned in a grid. The outcome will be demonstrated shortly.
for(i in seq_along(list.results)){
  # Step 1: Create network graph.
  g <- list.nets[[i]]
  g.nodeLabels <- list.nodes[[i]]
  g.nodeLabels <- dplyr::arrange(g.nodeLabels, ID.Node)
  g.edgeAttr <- list.results[[i]]
  g.edgeAttr$col <- nvn.makeColGradient(g.edgeAttr$weightedCutOff)
  g.edgeAttr$size <- ifelse(g.edgeAttr$col == "gray83", .5, .8)
  g1 <- ggnet2(g, node.color = "gray49", node.size = .5, edge.color = g.edgeAttr$col,
               edge.size = g.edgeAttr$size, mode = "kamadakawai", layout.par = list(niter = 5000)) +
    ggtitle(paste0("Rank ", which(df.summary$City == unique(g.edgeAttr$City)),
                   ": ", g.edgeAttr$City)) +
    theme(plot.title = element_text(size = 20, face = "bold"))
  # Step 2: Create two tables with the top n network edges with regard to each of the
  # two coefficients (CO/wCO).
  listindex <- i
  t.padding <- 36
  tbl1 <- nvn.topNcutOffs1(list.results, listindex, n = 5)
  tbl2 <- nvn.topNcutOffs2(list.results, listindex, n = 5)
  tbl1 <- nvn.lookUpNames(tbl1, listindex, mode = "replace")
  tbl2 <- nvn.lookUpNames(tbl2, listindex, mode = "replace")
  colnames(tbl1) <- c(str_pad("Source", t.padding, side = "both", pad = " "),
                      str_pad("Target", t.padding, side = "both", pad = " "),
                      str_pad("CO", t.padding/4, side = "right", pad = " "))
  colnames(tbl2) <- c(str_pad("Source", t.padding, side = "both", pad = " "),
                      str_pad("Target", t.padding, side = "both", pad = " "),
                      str_pad("wCO", t.padding/4, side = "right", pad = " "))
  tbl1 <- ggtexttable(tbl1, rows = NULL, theme = ttheme(
    colnames.style = colnames_style(
      color = "black", fill = "#ffffff",
      linewidth = 1, linecolor = "black"
    ),
    tbody.style = tbody_style(color = "black",
                              fill = c("#dcdcdc", "#ffffff"), hjust = 0, x = 0.05)
  ))
  tbl2 <- ggtexttable(tbl2, rows = NULL, theme = ttheme(
    colnames.style = colnames_style(
      color = "black", fill = "#ffffff",
      linewidth = 1, linecolor = "black"
    ),
    tbody.style = tbody_style(color = "black",
                              fill = c("#dcdcdc", "#ffffff"), hjust = 0, x = 0.05)
  ))
  # Step 3: Create six graphical objects to highlight measures.
  pos <- .5
  dim <- .85
  cap.cex <- 1
  num.cex <- 1.3
  vjust1 <- -1.35
  vjust2 <- 1
  col.fill <- "#dcdcdc"
  tile1 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("Network Size", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,3], vjust = vjust2, gp = gpar(cex = num.cex)))
  tile2 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("Max cut-off", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,5], vjust = vjust2, gp = gpar(cex = num.cex)))
  tile3 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("CO max", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,8], vjust = vjust2, gp = gpar(cex = num.cex)))
  tile4 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("CO mean", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,6], vjust = vjust2, gp = gpar(cex = num.cex)))
  tile5 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("wCO max", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,11], vjust = vjust2, gp = gpar(cex = num.cex)))
  tile6 <- grobTree(rectGrob(pos, pos, dim, dim, gp = gpar(fill = col.fill)),
                    textGrob("wCO mean", vjust = vjust1, gp = gpar(cex = cap.cex, fontface = "bold")),
                    textGrob(df.summary[listindex,9], vjust = vjust2, gp = gpar(cex = num.cex)))
  # Step 4: Pull it all together in one graphical object.
  spacer <- textGrob(" ")
  print(ggarrange(spacer,
                  ggarrange(g1,
                            ggarrange(
                              ggarrange(tile1, tile3, tile5, tile2, tile4, tile6,
                                        ncol = 3, nrow = 2),
                              tbl1, tbl2,
                              ncol = 1, nrow = 3),
                            ncol = 2, widths = c(2, 1.5)),
                  nrow = 2, heights = c(1, 30)))
}

The results will be shown in the next chapter in tabular form and in the detailed view that the code above produces for each network. But before we start, I want to demonstrate what we are about to see with one example. Let's take a look at the city of Hannover, which is displayed below.
What can we see here?
This is the dashboard-style format I have chosen for the presentation of the results. Basically, the code above puts everything together as a single picture, so that the same kind of output can be generated procedurally. At the time of writing, I still have to gather data on a number of networks. My approach allows me to add them later with little more than a button press.
Everything is in place now. My sample is still incomplete, but let's have a look. First, the tabular form of the results. Cities are ranked by the mean value of wCO. The size of the networks and of the parts that are cut off is measured in the number of nodes they contain. The table also shows the mean and maximum values for both coefficients and their standard deviations.
| Rank | City | network size | mean cut-off | max cut-off | mean CO | sd CO | max CO | mean wCO | sd wCO | max wCO |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Potsdam | 62 | 3.6567 | 24 | 0.1180 | 0.1468 | 0.7742 | 0.0037 | 0.0081 | 0.0570 |
| 2 | Jena | 48 | 2.5400 | 12 | 0.1058 | 0.1417 | 0.5000 | 0.0035 | 0.0066 | 0.0315 |
| 3 | Bochum/Gelsenkirchen | 188 | 12.2381 | 69 | 0.1302 | 0.1897 | 0.7340 | 0.0015 | 0.0034 | 0.0153 |
| 4 | Hannover | 195 | 8.6904 | 71 | 0.0891 | 0.1034 | 0.7282 | 0.0010 | 0.0026 | 0.0260 |
| 5 | Halle | 144 | 3.8354 | 34 | 0.0533 | 0.1118 | 0.4722 | 0.0010 | 0.0029 | 0.0169 |
| 6 | Erfurt | 102 | 2.7500 | 15 | 0.0539 | 0.0741 | 0.2941 | 0.0009 | 0.0019 | 0.0110 |
| 7 | Hamburg | 154 | 3.1988 | 16 | 0.0415 | 0.0516 | 0.2078 | 0.0004 | 0.0007 | 0.0050 |
| 8 | Köln | 213 | 3.8253 | 41 | 0.0359 | 0.0502 | 0.3850 | 0.0003 | 0.0008 | 0.0089 |
| 9 | Dresden | 256 | 3.7774 | 37 | 0.0295 | 0.0535 | 0.2891 | 0.0002 | 0.0006 | 0.0042 |
| 10 | Stuttgart | 271 | 2.1575 | 19 | 0.0159 | 0.0262 | 0.1402 | 0.0001 | 0.0002 | 0.0014 |
| 11 | München | 379 | 2.4360 | 21 | 0.0129 | 0.0193 | 0.1108 | 0.0000 | 0.0001 | 0.0016 |
| 12 | Berlin | 634 | 1.0921 | 18 | 0.0034 | 0.0076 | 0.0568 | 0.0000 | 0.0000 | 0.0003 |
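The ranking itself boils down to a descending sort on the mean wCO column. A minimal sketch with made-up numbers (not the actual df.summary):

```r
# Hypothetical miniature of df.summary: rank = descending order of mean wCO.
toy.summary <- data.frame(City     = c("Berlin", "Potsdam", "Jena"),
                          mean.wCO = c(0.0000,   0.0037,    0.0035))
toy.summary <- toy.summary[order(-toy.summary$mean.wCO), ]
toy.summary$Rank <- seq_len(nrow(toy.summary))
toy.summary$City  # "Potsdam" "Jena" "Berlin"
```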
The higher the rank, the higher the vulnerability of the network. The current champion of vulnerability is the city of Potsdam. The result matches my real-world knowledge of the place. The most problematic edge in the network is close to a single (and quite steep) bridge right next to the main station. If something happens there, the city will be split in two (in terms of public transportation). The city where I was late for work also ranks quite high, but not as high as I would have expected.
Now, the more detailed results for the networks, ranked by vulnerability.
At the moment the data are still incomplete (at this point I have 13 out of 80 cities covered; 13 cities but only 12 networks, because the two cities of Bochum and Gelsenkirchen share one). I will add more in the future to achieve a more accurate understanding. It is quite possible that there are correlations between the size of the cities/networks and the robustness of the networks. But in order to investigate this any further, I am going to need more data.
Another issue that will probably arise is indicated by the networks of Halle and of Bochum/Gelsenkirchen. Each of them is in fact two city networks connected by a thin line (Halle, for instance, is connected to neighboring Merseburg), which somewhat artificially increases their vulnerability. I might have to address this issue with more sophisticated measures. The coefficient wCO is designed to deal with this problem, but it is clearly insufficient in its current state.
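One conceivable direction (a hypothetical sketch, not something implemented in this project) would be to weight each cut by the size of the smaller fragment relative to the whole network, so that severing a pendant appendix like Merseburg counts far less than a genuinely balanced split:

```r
# Hypothetical sketch: weight a cut by the smaller side of the resulting split.
# network.size = nodes in the network, cut.off = nodes severed by removing an edge.
balancedCut <- function(network.size, cut.off){
  smaller <- pmin(cut.off, network.size - cut.off)
  smaller / (network.size / 2)  # 1 = perfectly even split, near 0 = pendant
}

balancedCut(144, 1)   # pendant in a 144-node network: ~0.014
balancedCut(144, 70)  # near-even split: ~0.97
```

Whether such a measure should replace or merely complement wCO is an open question for a later iteration.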
This is the first release of this project, in the spirit of a minimum viable product. I intend to add data and to pursue it further in the future.